This idea extends to three-dimensional data. In the first example (shown below), a line (one dimension) can provide a good representation of the original data:
In this second example, a line provides a poor representation, but a plane (two dimensions) provides a good one:
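The two pictures can be mimicked numerically. The sketch below simulates hypothetical 3D data (not the data behind the figures): one cloud scattered tightly around a line and one scattered tightly around a plane. It then uses `prcomp()` to compute the proportion of total variance captured by each principal component. The names `line_data`, `plane_data`, and `pve` are illustrative, and the data are left unscaled because all three coordinates share the same units.

```r
# Hypothetical simulated data, for illustration only
set.seed(1)
n <- 200
t1 <- rnorm(n)
t2 <- rnorm(n)

# Example 1: points scattered tightly around a line in 3D
line_data <- cbind(x = t1, y = 2 * t1, z = -t1) +
  matrix(rnorm(3 * n, sd = 0.1), ncol = 3)

# Example 2: points scattered tightly around a plane in 3D
plane_data <- cbind(x = t1, y = t2, z = t1 + t2) +
  matrix(rnorm(3 * n, sd = 0.1), ncol = 3)

# Proportion of total variance captured by each principal component
pve <- function(data) {
  v <- prcomp(data)$sdev^2
  v / sum(v)
}

round(pve(line_data), 3)   # nearly all variance in the first component
round(pve(plane_data), 3)  # the first two components are needed
```

For the line, one component carries almost all the variance; for the plane, the first two components together do, which matches the idea that a plane (but not a line) represents the second example well.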
The notation shown below defines the first principal component, \(Z_1\), as a linear combination of the original variables, \(X_1, X_2, \ldots, X_p\), weighted by the loadings (coefficients) \(\{\phi_{11}, \phi_{21}, \ldots, \phi_{p1}\}\). The result is a score (location) for each observation in the PC1 dimension:
\[Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \ldots + \phi_{p1} X_p\]
The second principal component, \(Z_2\), is defined in the same way, using its own loadings \(\{\phi_{12}, \phi_{22}, \ldots, \phi_{p2}\}\):
\[Z_2 = \phi_{12} X_1 + \phi_{22} X_2 + \ldots + \phi_{p2} X_p\]
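A minimal sketch of the formula for \(Z_1\), assuming the standardized `USArrests` data used in the rest of these notes: it computes the PC1 scores by hand as the weighted sum of the variables and checks them against the scores that `prcomp()` stores in its `x` component. The object names `pca`, `phi1`, and `Z1` are illustrative. It also shows that the loadings are normalized so that their squares sum to 1.

```r
# Standardize USArrests, as elsewhere in these notes
X <- scale(USArrests)

pca <- prcomp(X)
phi1 <- pca$rotation[, "PC1"]  # loadings phi_11, ..., phi_p1

# Z_1 = phi_11 * X_1 + phi_21 * X_2 + ... + phi_p1 * X_p (one score per state)
Z1 <- X %*% phi1

# The hand-computed scores should match the PC1 column returned by prcomp()
all.equal(as.vector(Z1), unname(pca$x[, "PC1"]))

# The loadings of a component are scaled to unit length
sum(phi1^2)
```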
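Because the standardized data have columns centered at 0, their covariance matrix can be computed directly as \(\frac{1}{n-1}\mathbf{X}^T\mathbf{X}\) or with the built-in cov() function; both give the same result:
data("USArrests")
X <- USArrests
X <- scale(X) ## Standardize
1/(nrow(X) - 1) * t(X) %*% X ## Calculating it ourselves
## Murder Assault UrbanPop Rape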
## Murder 1.00000000 0.8018733 0.06957262 0.5635788
## Assault 0.80187331 1.0000000 0.25887170 0.6652412
## UrbanPop 0.06957262 0.2588717 1.00000000 0.4113412
## Rape 0.56357883 0.6652412 0.41134124 1.0000000
cov(X) ## Using the built-in "cov" function
## Murder Assault UrbanPop Rape
## Murder 1.00000000 0.8018733 0.06957262 0.5635788
## Assault 0.80187331 1.0000000 0.25887170 0.6652412
## UrbanPop 0.06957262 0.2588717 1.00000000 0.4113412
## Rape 0.56357883 0.6652412 0.41134124 1.0000000
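A quick check of the two computations above, using only base R. `crossprod(X)` computes `t(X) %*% X`, and because scaling gives every column variance 1, the covariance matrix of the standardized data is the same as the correlation matrix of the raw data (hence the 1s on the diagonal). The names `C_manual` and `C_builtin` are illustrative.

```r
X <- scale(USArrests)  # standardize: each column has mean 0 and sd 1
n <- nrow(X)

C_manual <- crossprod(X) / (n - 1)  # same as t(X) %*% X / (n - 1)
C_builtin <- cov(X)

all.equal(C_manual, C_builtin, check.attributes = FALSE)

# For standardized data, the covariance matrix is the correlation matrix
all.equal(C_builtin, cor(USArrests))
```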
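The principal components can be found from the eigendecomposition of the covariance matrix \(\mathbf{C}\):
\[\mathbf{C} = \mathbf{V}\mathbf{L}\mathbf{V}^T\]
where the columns of \(\mathbf{V}\) are the eigenvectors of \(\mathbf{C}\) (the loadings of each principal component) and \(\mathbf{L}\) is a diagonal matrix containing the eigenvalues (the variances of the principal components).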
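A short derivation shows why each eigenvalue is the variance of a principal component. It assumes \(\mathbf{X}\) is the standardized (hence column-centered) \(n \times p\) data matrix, \(\mathbf{v}_k\) is the \(k\)-th column of \(\mathbf{V}\) (so its entries are the loadings \(\phi_{1k}, \ldots, \phi_{pk}\)), \(\lambda_k\) is the \(k\)-th diagonal entry of \(\mathbf{L}\), and \(\mathbf{z}_k = \mathbf{X}\mathbf{v}_k\) is the vector of scores on the \(k\)-th component. Since the columns of \(\mathbf{X}\) have mean 0, so does \(\mathbf{z}_k\), and the eigenvectors of the symmetric matrix \(\mathbf{C}\) have unit length:

```latex
\[
\operatorname{Var}(\mathbf{z}_k)
= \frac{1}{n-1}\,\mathbf{z}_k^T \mathbf{z}_k
= \mathbf{v}_k^T \left(\frac{1}{n-1}\mathbf{X}^T\mathbf{X}\right)\mathbf{v}_k
= \mathbf{v}_k^T \mathbf{C}\,\mathbf{v}_k
= \lambda_k\,\mathbf{v}_k^T\mathbf{v}_k
= \lambda_k
\]
```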
C <- cov(X)
eigen(C) ## Eigendecomposition of the covariance matrix
## eigen() decomposition
## $values
## [1] 2.4802416 0.9897652 0.3565632 0.1734301
##
## $vectors
## [,1] [,2] [,3] [,4]
## [1,] -0.5358995 0.4181809 -0.3412327 0.64922780
## [2,] -0.5831836 0.1879856 -0.2681484 -0.74340748
## [3,] -0.2781909 -0.8728062 -0.3780158 0.13387773
## [4,] -0.5434321 -0.1673186 0.8177779 0.08902432
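To confirm that `eigen()` returns the factorization \(\mathbf{C} = \mathbf{V}\mathbf{L}\mathbf{V}^T\), the sketch below rebuilds \(\mathbf{C}\) from the eigenvectors and eigenvalues. It assumes the same standardized `USArrests` data; the names `V` and `L` simply mirror the notation of the formula.

```r
X <- scale(USArrests)
C <- cov(X)
e <- eigen(C, symmetric = TRUE)

V <- e$vectors        # columns are the eigenvectors (loadings)
L <- diag(e$values)   # diagonal matrix of eigenvalues

# C = V L V^T, up to floating-point error
all.equal(V %*% L %*% t(V), C, check.attributes = FALSE)

# The eigenvectors are orthonormal: V^T V = I
all.equal(crossprod(V), diag(ncol(C)))

# The eigenvalues sum to the total variance (4 standardized variables)
sum(e$values)
```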
R contains a built-in function, prcomp(), that performs PCA without requiring so many steps:
X <- USArrests
X <- scale(X) ## Standardize
prcomp(X) ## Perform PCA
## Standard deviations (1, .., p=4):
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
##
## Rotation (n x k) = (4 x 4):
## PC1 PC2 PC3 PC4
## Murder -0.5358995 0.4181809 -0.3412327 0.64922780
## Assault -0.5831836 0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158 0.13387773
## Rape -0.5434321 -0.1673186 0.8177779 0.08902432
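The `prcomp()` output agrees with the eigendecomposition computed earlier: the squared standard deviations are the eigenvalues, and the rotation columns are the eigenvectors. A sketch checking this on standardized `USArrests`, which also computes each component's proportion of variance explained (the names `pca`, `e`, and `pve` are illustrative):

```r
X <- scale(USArrests)
pca <- prcomp(X)
e <- eigen(cov(X), symmetric = TRUE)

# Squared standard deviations of the PCs are the eigenvalues of C
all.equal(pca$sdev^2, e$values)

# Rotation columns are the eigenvectors; compare absolute values because
# each eigenvector is only determined up to its sign
all.equal(abs(unname(pca$rotation)), abs(e$vectors))

# Proportion of variance explained by each component, and cumulatively
pve <- pca$sdev^2 / sum(pca$sdev^2)
round(pve, 3)
round(cumsum(pve), 3)
```

Calling `summary(pca)` reports the same quantities in its "Proportion of Variance" and "Cumulative Proportion" rows.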
The same kind of prcomp() output, here for nine variables describing colleges:
## Standard deviations (1, .., p=9):
## [1] 1.9954525 1.0788358 0.8867900 0.6396917 0.4662448 0.3487425 0.3202376
## [8] 0.2513177 0.1671921
##
## Rotation (n x k) = (9 x 9):
## PC1 PC2 PC3 PC4 PC5
## admissionRate -0.1804793 -0.03538274 0.92780641 -0.31894586 0.01308557
## ACTmath 0.4831079 0.03595880 0.05275053 -0.02202215 -0.20376965
## ACTenglish 0.4258063 0.08076983 0.07204367 -0.09362101 -0.12816934
## undergrads 0.1336508 -0.61270224 0.19275286 0.49993205 -0.48714672
## cost 0.2799760 0.40436858 -0.03533647 -0.34856772 -0.50273913
## gradRate 0.4027967 0.14617155 0.19023908 0.38344935 0.26046245
## FYretention 0.4221292 0.10774438 0.16251349 0.14930097 0.51048338
## fedloan -0.3240619 0.44623432 0.11016604 0.49268655 -0.01230583
## debt -0.1049739 0.46894908 0.13439471 0.32485065 -0.35104894
## PC6 PC7 PC8 PC9
## admissionRate -0.04215275 0.01827869 0.009742059 -0.03531274
## ACTmath -0.40686339 -0.30828053 -0.028677929 -0.67758906
## ACTenglish -0.36307868 -0.16340715 -0.424831305 0.66541190
## undergrads 0.24316410 0.09309020 -0.118037660 0.02466435
## cost 0.40893056 0.45564041 -0.053956579 -0.06972096
## gradRate -0.24096830 0.50906734 0.485465178 0.11149885
## FYretention 0.60484531 -0.25016313 -0.257837343 -0.07748462
## fedloan -0.16700746 0.22824611 -0.569676237 -0.19055711
## debt 0.15070219 -0.53649390 0.418412811 0.19140498
(Note that undergrads is negative on the previous slide.)
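The standard deviations in the nine-variable output can be read the same way as before: each component's proportion of variance explained is its variance (squared standard deviation) divided by the total. The sketch below copies those printed values rather than re-fitting anything, since the underlying college data are not part of these notes; the names `sdev`, `var_pc`, and `pve` are illustrative.

```r
# Standard deviations copied from the nine-variable prcomp() output above
sdev <- c(1.9954525, 1.0788358, 0.8867900, 0.6396917, 0.4662448,
          0.3487425, 0.3202376, 0.2513177, 0.1671921)

var_pc <- sdev^2              # variance of each principal component
pve <- var_pc / sum(var_pc)   # proportion of variance explained

round(pve, 3)
round(cumsum(pve), 3)         # cumulative proportion
```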